A Novel Metadata Based Multi-Label Document Classification Technique

Authors

Abstract

Research and its publication have grown steadily from the beginning, and with the emergence of web technologies the growth rate has become overwhelming. On a rough estimate, more than thirty thousand journals have been issuing around four million papers annually on average. Search engines, indexing services, and digital libraries have been searching for such publications over the web. Nevertheless, getting the most relevant articles against user requests is still a fantasy, mainly because articles are not appropriately indexed based on hierarchies of granular subject classification. To overcome this issue, researchers are striving to investigate new classification techniques, especially for cases where the complete article text is not available (as with non-open access articles). The proposed study aims to investigate multilabel classification over the available metadata in the best possible way and to assess "to what extent metadata-based features can perform in contrast to content-based approaches." In this regard, a novel technique for multilabel classification is proposed, developed, and evaluated on metadata such as the Title and Keywords of the articles. The technique has been assessed on two diverse datasets, namely, a dataset from the Journal of Universal Computer Science (J.UCS) and a benchmark dataset comprising articles published by the Association for Computing Machinery (ACM). The technique yields encouraging results in contrast to the state-of-the-art techniques in the literature.
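
The setting described in the abstract (assigning one or more subject categories to an article from its Title and Keywords alone) can be illustrated with a small sketch. This is not the authors' technique, which the abstract does not specify; it is a deliberately simple term-profile baseline. The function names, the category labels, the toy records, and the `threshold` parameter are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def metadata_terms(title, keywords):
    """Combine the tokens of the title and of every keyword phrase."""
    return tokenize(title) + tokenize(" ".join(keywords))


def train(records):
    """Build one normalized term-frequency profile per label.

    records: iterable of (title, keywords, labels) tuples, where an
    article may carry several labels (the multi-label case).
    """
    counts = defaultdict(Counter)
    for title, keywords, labels in records:
        terms = metadata_terms(title, keywords)
        for label in labels:
            counts[label].update(terms)
    profiles = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        profiles[label] = {term: n / total for term, n in counter.items()}
    return profiles


def predict(profiles, title, keywords, threshold=0.5):
    """Return every label scoring at least threshold * best score.

    The relative threshold is an assumption of this sketch: it lets an
    article receive more than one label when several profiles match.
    """
    terms = metadata_terms(title, keywords)
    scores = {
        label: sum(profile.get(term, 0.0) for term in terms)
        for label, profile in profiles.items()
    }
    best = max(scores.values(), default=0.0)
    if best == 0.0:
        return []  # no term overlap with any label profile
    return sorted(l for l, s in scores.items() if s >= threshold * best)


# Toy training data (hypothetical, not taken from J.UCS or ACM).
TOY_RECORDS = [
    ("Indexing XML documents for retrieval",
     ["information retrieval", "indexing"],
     ["information-retrieval"]),
    ("Neural networks for image recognition",
     ["learning", "neural networks"],
     ["machine-learning"]),
    ("Learning to rank documents for retrieval",
     ["learning", "ranking", "retrieval"],
     ["information-retrieval", "machine-learning"]),
]

toy_profiles = train(TOY_RECORDS)
print(predict(toy_profiles, "Neural ranking", ["retrieval", "learning"]))
```

A relative threshold is only one way to turn per-label scores into a label set; a fixed cut-off or a top-k rule would be equally valid choices for a sketch like this.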

Similar resources

Novel Unsupervised Features for Czech Multi-label Document Classification

This paper deals with automatic multi-label document classification in the context of a real application for the Czech News Agency. The main goal of this work consists in proposing novel fully unsupervised features based on an unsupervised stemmer, Latent Dirichlet Allocation and semantic spaces (HAL and COALS). The proposed features are integrated into the document classification task. Another...

Word Embeddings for Multi-label Document Classification

In this paper, we analyze and evaluate word embeddings for representation of longer texts in the multi-label document classification scenario. The embeddings are used in three convolutional neural network topologies. The experiments are realized on the Czech ČTK and English Reuters-21578 standard corpora. We compare the results of word2vec static and trainable embeddings with randomly initializ...

Multi-label Document Classification in Czech

This paper deals with multi-label automatic document classification in the context of a real application for the Czech news agency. The main goal of this work is to compare and evaluate the three most promising multi-label document classification approaches on the Czech language. We show that the simple method based on a meta-classifier proposed by Zhu et al. significantly outperforms the other appro...

Boosting-based Multi-label Classification

Multi-label classification is a machine learning task that assumes that a data instance may be assigned multiple class labels at the same time. Modelling of this problem has recently become an important research topic. This paper revisits the AdaBoostSeq multi-label classification algorithm and examines it in order to check its robustness properties. It can be stated that AdaBoostSeq ...

A Multilingual Polarity Classification Method using Multi-label Classification Technique Based on Corpus Analysis

In NTCIR-7 MOAT, we participated in four sub-tasks (opinion & holder detection, relevance judgment, and polarity classification) at two language sides: Japanese and English. In this paper, we focused on the feature selection and polarity classification methodology in both languages. To detect opinion and classify the polarity, the features were selected based on a st...

Journal

Journal title: Computer Systems Science and Engineering

Year: 2023

ISSN: 0267-6192

DOI: https://doi.org/10.32604/csse.2023.033844